decay rate
A PID Controller Approach for Adaptive Probability-dependent Gradient Decay in Model Calibration
During model optimization, the expected calibration error tends to overfit earlier than classification accuracy, indicating that classification error and calibration error have distinct optimization objectives. To optimize model accuracy and model calibration consistently, we propose a novel method that incorporates a probability-dependent gradient decay coefficient into the loss function. This coefficient is strongly correlated with the overall confidence level.
Error Analysis of Generalized Langevin Equations with Approximated Memory Kernels
We analyze prediction error in stochastic dynamical systems with memory, focusing on generalized Langevin equations (GLEs) formulated as stochastic Volterra equations. We establish that, under a strongly convex potential, trajectory discrepancies decay at a rate determined by the decay of the memory kernel and are quantitatively bounded by the estimation error of the kernel in a weighted norm. Our analysis integrates synchronized noise coupling with a Volterra comparison theorem, encompassing both subexponential and exponential kernel classes. For first-order models, we derive moment and perturbation bounds using resolvent estimates in weighted spaces. For second-order models with confining potentials, we prove contraction and stability under kernel perturbations using a hypocoercive Lyapunov-type distance. This framework accommodates non-translation-invariant kernels and white-noise forcing, explicitly linking improved kernel estimation to enhanced trajectory prediction. Numerical examples validate these theoretical findings.
- North America > United States > North Carolina > Durham County > Durham (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
AdamNX: An Adam improvement algorithm based on a novel exponential decay mechanism for the second-order moment estimate
Zhu, Meng; Xiao, Quan; Min, Weidong
Since the start of the 21st century, artificial intelligence has been driving a new industrial revolution. Within the training framework, the optimization algorithm aims to converge stably to local, and ideally global, minima of high-dimensional objectives. In the era of large language models, Adam remains the mainstream optimization algorithm even as model parameters and data have scaled up. However, compared with optimizers based on stochastic gradient descent (SGD), Adam is more likely to converge to non-flat minima. To address this issue, the AdamNX algorithm is proposed. Its core innovation is a novel exponential decay rate for the second-order moment estimate, which gradually weakens the strength of the learning-step correction as training progresses and degrades to momentum SGD in the stable training period, thereby improving training stability in that period and possibly enhancing generalization. Experimental results show that the proposed decay rate outperforms the currently used second-order moment decay rate, and that AdamNX consistently outperforms Adam and its variants. Our code is open-sourced at https://github.com/mengzhu0308/AdamNX.
- North America > Canada > Ontario > Toronto (0.14)
- Asia > China > Jiangxi Province > Nanchang (0.05)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
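The AdamNX abstract above describes a second-moment decay rate that weakens Adam's step correction over time, but this listing does not give the formula. The following is only a minimal sketch of the general idea, assuming a stand-in schedule `beta2_t = 1 - 1/t` (which turns the second moment into a running mean of squared gradients); it is not the AdamNX rule, and the names `beta2_schedule` and `adam_variable_beta2` are hypothetical. As `beta2_t` approaches 1, the second-moment estimate changes less and less, so the per-coordinate scaling becomes nearly constant and the update approaches momentum SGD with a fixed per-coordinate step.

```python
import math


def beta2_schedule(t: int) -> float:
    """Stand-in schedule (an assumption, not AdamNX's): rises from 0 toward 1."""
    return 1.0 - 1.0 / t


def adam_variable_beta2(grad_fn, x0, steps=300, lr=0.1, beta1=0.9, eps=1e-8):
    """Adam-style update whose second-moment decay rate varies with the step t."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment (momentum) estimate
    v = [0.0] * len(x)  # second-moment estimate
    beta2_history = []
    for t in range(1, steps + 1):
        g = grad_fn(x)
        b2 = beta2_schedule(t)
        beta2_history.append(b2)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            # With b2 = 1 - 1/t, v is the exact running mean of g^2,
            # so no bias correction is applied to it.
            v[i] = b2 * v[i] + (1.0 - b2) * g[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** t)  # standard first-moment bias correction
            x[i] -= lr * m_hat / (math.sqrt(v[i]) + eps)
    return x, beta2_history


if __name__ == "__main__":
    # Toy quadratic f(x) = 0.5 * (x0^2 + 10 * x1^2), gradient [x0, 10 * x1].
    final_x, history = adam_variable_beta2(lambda p: [p[0], 10.0 * p[1]], [3.0, -2.0])
    print(final_x, history[-1])
```

To experiment with a different decay rule, only `beta2_schedule` needs to change; the repository linked in the abstract is the place to look for the authors' actual schedule.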